perf(conservative): apply weights via scipy CSR matmul (robust without opt-einsum) by thodson-usgs · Pull Request #74 · xarray-contrib/xarray-regrid

thodson-usgs · 2026-05-30T18:55:02Z

Summary

The axis-factored conservative regridder applied its (sparse) per-axis weights with a single multi-operand xr.dot(data, W_lat, W_lon, optimize=True). That contraction only finds an efficient separable path when opt_einsum is installed. Installing the conservative-2d extra (which brings sparse) without accel (which brings opt-einsum) hits the slow path: the sparse multi-operand einsum is 20–80× slower (13–29 s for a 48-step 0.5° field) and returns a sparse.COO result — a wasteful container for a dense field.

This applies each per-axis weight with a scipy CSR sparse-dense matmul instead (_csr_apply_axis, via apply_ufunc so dask / output_chunks / skipna are preserved). One unified path — the weight is compressed to CSR whether stored as sparse.COO or dense. Separability makes the sequential per-axis contraction identical to the fused one. It is fast with or without opt_einsum, keeps the weights compressed (the (n_src, n_dst) matrix is never densified — it reaches hundreds of MB–GB at high resolution), and yields a dense result. scipy is already a base dependency, so no new requirement.

Benchmark (no `opt_einsum`, eager, ntime=48)

grid	before (sparse `xr.dot`)	after (CSR)	speedup
1° → 2.5°, skipna False / True	3210 / 7137 ms	62 / 94 ms	~52–76×
0.5° → 1°, skipna False / True	13260 / 29309 ms	223 / 371 ms	~59–79×

With opt_einsum present the old path was already fast; CSR matches it. Numerical results are unchanged (sequential vs fused contraction differs only at ~1e-15).

Tests

test_conservative_conserves_known_integral — gold-standard known-integral conservation (cos²(lat)·(1.5+sin(lon)) → 4π, conserved to ~4e-7 measured with independent analytic spherical cell areas).
test_conservative_returns_dense_output — regression guard that the result is a dense ndarray.
The full existing suite (CDO comparisons, NaN handling, dask chunking, attrs) passes; verified identical results (max abs diff 0.0) on both the sparse-installed and no-sparse code paths.

Note

This effectively supersedes #49's sparse weights for the axis-factored method's apply step — the weights are still stored compressed; they are just multiplied via CSR instead of a sparse einsum. utils.overlap still builds the weight dense transiently during the build; making that sparse end-to-end is a separate, deeper optimization left for later.

🤖 Generated with Claude Code

…r without opt_einsum) The axis-factored conservative method stored its weights as `sparse.COO` (xarray-contrib#49) and applied them with one multi-operand `xr.dot(data, W_lat, W_lon, optimize=True)`. That contraction only finds an efficient separable path when `opt_einsum` is installed; without it -- e.g. installing the `conservative-2d` extra (which brings `sparse`) but not `accel` (which brings `opt-einsum`) -- the sparse multi-operand einsum is catastrophic (13-29 s for a 48-step 0.5deg field) and the result comes back `sparse.COO`, a wasteful container for a dense field. Apply each per-axis weight with a scipy CSR sparse-dense matmul instead (`_csr_apply_axis`, via `apply_ufunc` so dask / output_chunks / skipna are preserved). One unified path: the weight is compressed to CSR whether stored as `sparse.COO` or dense. Separability makes the sequential per-axis contraction identical to the fused one. This is fast and robust whether or not opt_einsum is present, keeps the weights compressed (the (n_src, n_dst) matrix is never densified -- it reaches hundreds of MB to GB at high resolution), and yields a dense result. scipy is already a base dependency, so no new requirement. Speedups without opt_einsum: ~56x at 1deg, ~60-80x at 0.5deg. Numerical results unchanged (sequential vs fused contraction differs only at the ~1e-15 fp level). Tests (tests/test_regrid.py): - test_conservative_conserves_known_integral: gold-standard known-integral conservation -- cos^2(lat)*(1.5+sin(lon)) -> 4*pi, conserved to ~4e-7 measured with independent analytic spherical cell areas. - test_conservative_returns_dense_output: regression guard that the result is a dense ndarray, not sparse.COO. mypy: add a scipy.* override (scipy ships no stubs) and annotate the new returns; error count unchanged from baseline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(conservative): apply weights via scipy CSR matmul (robust without opt-einsum)#74

perf(conservative): apply weights via scipy CSR matmul (robust without opt-einsum)#74
thodson-usgs wants to merge 1 commit into
xarray-contrib:mainfrom
thodson-usgs:perf/conservative-csr-apply

thodson-usgs commented May 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

thodson-usgs commented May 30, 2026

Summary

Benchmark (no opt_einsum, eager, ntime=48)

Tests

Note

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Benchmark (no `opt_einsum`, eager, ntime=48)